Homework 11: Maps

General instructions for all assignments:

  • Use this file as the template for your submission. Delete the unnecessary text (e.g. this text, the problem statements, etc.). That said, keep the nicely formatted “Problem 1”, “Problem 2”, “a.”, “b.”, etc.
  • Upload a single R Markdown file (named as: [AndrewID]-HW11.Rmd – e.g. “sventura-HW11.Rmd”) to the Homework 11 submission section on Blackboard. You do not need to upload the .html file.
  • The instructor and TAs will run your .Rmd file on their computers. If your .Rmd file does not knit on our computers, 10 points will automatically be deducted.
  • Your file should contain the code to answer each question in its own code block. Your code should produce plots/output that will be automatically embedded in the output (.html) file.
  • Each answer must be supported by written statements (unless otherwise specified)
  • Include the name of anyone you collaborated with at the top of the assignment
  • Include the style guide you used below under Problem 0


Easy Problems

Problem 0

(4 points)

Organization, Themes, and HTML Output

  1. For all problems in this assignment, organize your output as follows:
  • Use code folding for all code. See here for how to do this.
  • Use a floating table of contents.
  • Suppress all warning messages in your output by using warning = FALSE and message = FALSE in every code block.
  • Use tabs only if you see fit – this is your choice.
  2. For all problems in this assignment, adhere to the following guidelines for your ggplot theme and use of color:
  • Do not use the default ggplot() color scheme.
  • For any bar chart or histogram, outline the bars (e.g. with color = "black").
  • Do not use both red and green in the same plot, since a large proportion of the population is red-green colorblind.
  • Try to use at most three colors in your themes. In previous assignments, many students used different colors for the axes, axis ticks, axis labels, graph titles, grid lines, background, etc. This is unnecessary and only makes your graphs more difficult to read. Use a more concise color scheme.
  • Make sure you use a white or gray background (preferably light gray if you use gray).
  • Make sure that titles, labels, and axes are in dark colors, so that they contrast well with the light colored background.
  • Only use color when necessary and when it enhances your graph. For example, if you have a univariate bar chart, there’s no need to color the bars different colors, since this is redundant.
  • In general, try to keep your themes (and written answers) professional. Remember, you should treat these assignments as professional reports.
  3. Treat your submission as a formal report:
  • Use complete sentences when answering questions.
  • Answer in the context of the problem.
  • Treat your submission as a formal “report”, in which you provide detailed analyses to answer the research questions asked in the problems.
  4. What style guide are you using for this assignment?
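
The output and theme guidelines above can be sketched in R as follows. This is only an illustration under stated assumptions: the name hw_theme, the "steelblue" fill, and the use of the built-in mtcars data are choices made for this example and are not part of the assignment. The YAML lines in the comments are standard html_document options for code folding and a floating table of contents. Setting the chunk options once with knitr::opts_chunk$set() is a common shortcut that has the same effect as writing them in every chunk.

```r
# YAML header options for the .Rmd file (shown as comments because they are
# not R code):
# output:
#   html_document:
#     code_folding: hide
#     toc: true
#     toc_float: true

# Apply warning = FALSE and message = FALSE to all chunks from a setup chunk.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(warning = FALSE, message = FALSE)
}

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Hypothetical theme: white background, dark text, no minor grid lines.
  hw_theme <- theme_bw() +
    theme(text = element_text(color = "black"),
          axis.text = element_text(color = "gray20"),
          panel.grid.minor = element_blank())

  # Univariate bar chart: one fill color, bars outlined in black.
  p <- ggplot(mtcars, aes(x = factor(cyl))) +
    geom_bar(fill = "steelblue", color = "black") +
    labs(title = "Cars by Number of Cylinders",
         x = "Number of cylinders", y = "Number of cars") +
    hw_theme
  p
}
```

Defining the theme once and adding it to every plot keeps the color scheme consistent across the whole report.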


Problem 1

(3 points each)

Read

  1. Read this article. Write 1-3 sentences about what you learned from it.

  2. Read this article. Write 1-3 sentences about what you learned from it.

  3. Read the in-depth description of the ggmap package in the short paper by David Kahle and Hadley Wickham here. Write 1-3 sentences about what you learned from it.

  4. Read the article on ggmap here. Which functions can you use to create geographic heat maps?



Problem 2

(2 points each)

Maps with ggmap

Install and load the ggmap package. This package can be used to access maps from Google’s Maps API.

  1. Look at the help documentation for the get_map() function. What does it do? What are the different map sources that can be used in get_map()?

  2. In the help documentation, describe the zoom parameter. Roughly, what would be an appropriate value of this parameter if we wanted to display a square with width 1 mile? (Just a rough estimate is fine; an exact number is not required.)

  3. In the help documentation, what are the different maptype values that can be used? Which of these is unique to Google Maps?

  4. What does the map in the following code block show? Describe it. Explain what each of the parameters in the get_map() and ggmap() functions is doing.

(Note: Before doing this, you may need to install the most updated versions of these packages from GitHub – see commented code below.)

#devtools::install_github('hadley/ggplot2')
#devtools::install_github('thomasp85/ggforce')
#devtools::install_github('thomasp85/ggraph')
#devtools::install_github('slowkow/ggrepel')
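library(ggmap)
# Recent ggmap releases need a Google Maps API key before source = "google" works:
# register_google(key = "YOUR_API_KEY")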
map_base <- get_map(location = c(lon = -79.944248, lat = 40.4415861),
                    color = "color",
                    source = "google",
                    maptype = "hybrid",
                    zoom = 16)

map_object <- ggmap(map_base,
                    extent = "device",
                    ylab = "Latitude",
                    xlab = "Longitude")
map_object

  5. Recreate the map in part (d). Try changing the zoom parameter to a non-integer value (e.g. 16.5). What happens?

  6. Type class(map_object). What kind of object is your map?
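
For the zoom question above, one way to build intuition is to compute roughly how much ground a map image covers at each zoom level. The sketch below uses the standard Web Mercator relationship, in which the ground resolution halves with every zoom step. The function name map_width_miles and the 640-pixel image width are assumptions made for this example. Check the get_map() help page for the actual image size before relying on the numbers.

```r
# Approximate ground width (in miles) of a square Web Mercator map image.
# Assumes 256-pixel tiles at zoom 0 and an image width_px pixels wide.
map_width_miles <- function(zoom, lat, width_px = 640) {
  # Meters per pixel at the given latitude and zoom level
  meters_per_px <- 156543.03392 * cos(lat * pi / 180) / 2^zoom
  # Convert the full image width from meters to miles
  meters_per_px * width_px / 1609.344
}

# Latitude taken from the get_map() call in the code above
zooms <- 12:18
widths <- sapply(zooms, map_width_miles, lat = 40.4415861)
data.frame(zoom = zooms, width_miles = round(widths, 2))
```

Because the width halves at each step, the table makes it easy to see which integer zoom levels bracket a target width.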



Problem 3

(2 points each)

Finding Latitudes and Longitudes

There are many ways to find latitude and longitude coordinates of specific places. Here’s one easy way:

  1. Go to Google Maps. Type in “times square, nyc” and hit enter. The map should center around New York City. Now, look at the URL in your internet browser. After the @ symbol, the latitude and longitude of the center of the map are displayed (in order). What is the latitude of the map centered on Times Square? What is the longitude?

  2. After the latitude and longitude, the zoom level is displayed (e.g. “17z”). Change this to zoom level 12, and delete any text to the right. This should give you a map that displays most of New York City. Do the latitude/longitude coordinates change when you do this?

  3. Using the code from Problem 2, part (d) as a template, create a black and white (color = "bw") map of NYC in R, centered near Times Square, at a zoom level of 12, and with a roadmap map type. Describe the map that is output in R.
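
The URL pattern described above (latitude, longitude, then zoom after the @ symbol) can also be read programmatically. The sketch below is illustrative only: the URL and its coordinate values are hypothetical stand-ins, so copy the real URL from your own browser. Note that the get_map() call in Problem 2 expects the location as c(lon = ..., lat = ...), which is the reverse of the order in the URL.

```r
# Hypothetical URL for illustration; the values are not an official answer.
url <- "https://www.google.com/maps/place/Times+Square/@40.7580,-73.9855,17z"

# Capture the three numbers in "@<lat>,<lon>,<zoom>z"
pattern <- "@(-?[0-9.]+),(-?[0-9.]+),([0-9.]+)z"
parts <- regmatches(url, regexec(pattern, url))[[1]]

# parts[1] is the full match; parts[2:4] are the captured groups
center <- c(lon = as.numeric(parts[3]), lat = as.numeric(parts[2]))
zoom <- as.numeric(parts[4])

center
zoom
```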



Hard Problems

For both problems below, it may help to look over:

  • The Lecture 20 R Demo on Blackboard
  • The Maps Helper R Code file on Blackboard (thanks to TA Jennifer Jin for this!)

Both files are located under Course Content.

Problem 4

(30 points total)

Mapping US Flights

  1. (5 points) Load the airports and routes datasets from the Lecture 20 R Demo on Blackboard. Use ggmap to create a map centered on the continental United States. Add points corresponding to the location of each airport, sized by the number of arriving flights at that airport.

  2. (15 points) Recreate your plot in (a). This time, manipulate the routes and airports datasets so that you can use geom_segment() to draw a line connecting each airport for each flight listed in the routes dataset. That is, draw a line that connects the departing airport and the arrival airport. Do this only for flights that either depart from or arrive at the Los Angeles airport (LAX).

  3. (5 points) Recreate your plot from (b). Change the lines on your plot so that they are now curved arrows indicating the start and end points of each flight route. See the help documentation for geom_segment() here for how to do this. Do this only for flights that either depart from or arrive at the Los Angeles airport (LAX).

  4. (5 points) Calculate the change in time zones for each flight. When doing this, New York City to Los Angeles should be +3, and Los Angeles to New York City should be -3. Recreate your graph in (c). This time, color the lines by the change in time zones of each flight. Use a color gradient to do this. Be sure to include a detailed legend. Do this only for flights that either depart from or arrive at the Los Angeles airport (LAX).
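
The data manipulation described in parts (a) through (d) can be sketched on toy data. Everything below is an assumption made for illustration: the column names (iata, lat, lon, utc_offset, src, dest), the approximate coordinates, and the standard-time UTC offsets are not taken from the Lecture 20 datasets, whose columns may differ. The plot uses plain ggplot() rather than ggmap() so that it runs without an API key. On a ggmap() base layer, curved segments may additionally need a Cartesian coordinate system.

```r
# Toy airports with approximate coordinates and standard-time UTC offsets
airports <- data.frame(
  iata = c("LAX", "JFK", "ORD", "DEN"),
  lat = c(33.94, 40.64, 41.98, 39.86),
  lon = c(-118.41, -73.78, -87.90, -104.67),
  utc_offset = c(-8, -5, -6, -7)
)
routes <- data.frame(
  src = c("JFK", "LAX", "ORD", "LAX", "JFK"),
  dest = c("LAX", "JFK", "LAX", "DEN", "ORD")
)

# Part (a): number of arriving flights per airport (column Freq)
arrivals <- as.data.frame(table(dest = routes$dest))

# Parts (b)-(d): keep only routes that depart from or arrive at LAX
lax_routes <- subset(routes, src == "LAX" | dest == "LAX")

# Attach source and destination coordinates with two joins
src_info <- setNames(airports, paste0("src_", names(airports)))
dest_info <- setNames(airports, paste0("dest_", names(airports)))
flights <- merge(lax_routes, src_info, by.x = "src", by.y = "src_iata")
flights <- merge(flights, dest_info, by.x = "dest", by.y = "dest_iata")

# Sign convention from part (d): New York to Los Angeles is +3
flights$tz_change <- flights$src_utc_offset - flights$dest_utc_offset

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(flights) +
    geom_curve(aes(x = src_lon, y = src_lat, xend = dest_lon, yend = dest_lat,
                   color = tz_change),
               curvature = 0.2, arrow = arrow(length = unit(0.02, "npc"))) +
    scale_color_gradient2(low = "darkblue", mid = "gray70", high = "darkorange",
                          midpoint = 0, name = "Change in time zones") +
    labs(x = "Longitude", y = "Latitude")
  p
}
```

Swapping geom_curve() for geom_segment() gives the straight-line version asked for in part (b).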



Problem 5

(40 points)

Choropleth Maps of Rent Prices

  1. (2 points) Load the Zillow rent prices dataset from Kaggle, located on the course GitHub page.

  2. (20 points) Create a choropleth map showing the average rent prices (in January 2017) at the state level. Use an appropriate three-color gradient. Describe the graph, pointing out any interesting features of the spatial distribution of rent prices in January 2017.

library(tidyverse)
library(ggmap)
rent <- read_csv("https://raw.githubusercontent.com/sventura/315-code-and-datasets/master/data/price.csv") %>%
  select(County, State, `January 2017`) %>%
  rename(jan_2017 = `January 2017`) %>%
  arrange(State) %>%
  group_by(State) %>%
  summarize(mean_rent = mean(jan_2017))
# tibble() replaces the deprecated data_frame()
state_data <- tibble(state.abb, state.name) %>%
  mutate(state.name = tolower(state.name)) %>%
  left_join(rent, by = c("state.abb" = "State"))

state_borders <- map_data("state") %>%
  left_join(state_data, by = c("region" = "state.name"))
# Group by map_data()'s polygon id so multi-part states are drawn as separate polygons
ggplot(state_borders, aes(x = long, y = lat, fill = mean_rent)) + 
  geom_polygon(aes(group = group), color = "black") + 
  theme_void() + 
  coord_map("polyconic") + 
  scale_fill_gradient2(high = "darkred", low = "darkblue", 
                       mid = "white", midpoint = 1500)

  3. (18 points) Create a choropleth map showing the average rent prices (in January 2017) at the county level. Use an appropriate three-color gradient. Describe the graph, pointing out any interesting features of the spatial distribution of rent prices in January 2017.

# Average by state and county, since many county names repeat across states
rent <- read_csv("https://raw.githubusercontent.com/sventura/315-code-and-datasets/master/data/price.csv") %>%
  select(County, State, `January 2017`) %>%
  rename(jan_2017 = `January 2017`) %>%
  group_by(State, County) %>%
  summarize(mean_rent = mean(jan_2017), .groups = "drop") %>%
  mutate(County = tolower(County)) %>%
  # Add the lowercase state name used by map_data("county")
  left_join(tibble(State = state.abb, region = tolower(state.name)), by = "State")

# Match on both state (region) and county (subregion)
county_borders <- map_data("county") %>%
  left_join(rent, by = c("region", "subregion" = "County"))

ggplot(county_borders, aes(x = long, y = lat, fill = mean_rent)) + 
  geom_polygon(aes(group = group)) + 
  theme_void() + 
  coord_map("polyconic") + 
  scale_fill_gradient2(high = "darkred", low = "darkblue", 
                       mid = "white", midpoint = 2000)

Hint: For parts (b) and (c) above, it may help to use the following:

  • the following functions from the dplyr package (part of tidyverse): group_by(), summarize(), and left_join()
  • the state.abb and state.name objects, e.g. with state_data <- tibble(state.abb, state.name)
  • the tolower() function
  • the Lecture 20 R Demo

Specifically, for the state-level graph, you’ll want to:

  • Group the rent data by state, then summarize it on the average January 2017 rent price, so that you have a new dataset with one row per state
  • Connect the state abbreviations to their appropriate state name
  • Get the state_borders data (see Lecture 20 R Demo)
  • Connect the state names in state_borders to the state names from your summarized rent dataset and add the average January 2017 rent prices to the state_borders dataset
  • Plot it like we did in Lecture 20
  • Use a similar process for the county-level graph
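
The step of connecting state abbreviations to state names can be checked on a small example before joining the real data. The sketch below uses only base R. The rent values are made up for illustration and are not taken from the Zillow dataset, and the names state_lookup, toy_rent, and joined are hypothetical. It also shows how to spot rows that fail to match, which is worth doing because state.abb contains only the 50 states.

```r
# state.abb and state.name ship with base R and are stored in the same order
state_lookup <- data.frame(State = state.abb,
                           region = tolower(state.name),
                           stringsAsFactors = FALSE)

# Made-up rent values, one row per abbreviation
toy_rent <- data.frame(State = c("PA", "NY", "DC"),
                       mean_rent = c(1200, 2100, 2500),
                       stringsAsFactors = FALSE)

# Keep every rent row, adding the lowercase state name where one exists
joined <- merge(toy_rent, state_lookup, by = "State", all.x = TRUE)
joined

# Rows with no matching state name (DC is not in state.abb)
joined[is.na(joined$region), ]
```

The lowercase region column matches the region column returned by map_data("state"), so the same key can be used for the second join onto the border data.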


Bonus Problems

See the BonusProblems assignment on Blackboard.